In the previous tutorials we learned about feed forward neural networks, where each neuron in a layer learns from every neuron in the previous layer. This architecture assumes independence between the features of the data, which helps in learning an abstract global representation. However, it cannot learn local representations of the data because there is no interaction between neighboring neurons. For example, when we analyze an image, a video, or a speech sequence, we need to learn local representations of the input and then build on those local representations to form a global representation.
We attain this by imposing sparsity on the layers, such that most of the weights are zero except for a few. We attain sparsity by forcing each neuron to learn only from a few neighboring inputs. Let $k$ be the number of neighbors we allow a neuron to learn from. If we take $k=3$, then each neuron gets input only from the 3 previous neurons. Forcibly reducing the inputs of the neuron to 3 allows it to learn local representations of the image; in the image case, the initial convolution layer learns small edges. Another advantage of this technique is good old reusability. Once a neuron is trained to detect an edge, we can use the same neuron to detect that edge wherever it appears by moving it around the entire image. We can see this in two ways: as a single neuron moving around the image to detect features, or as many neurons sharing the same weights. This operation is called convolution.
Let's look at it mathematically. Let $x = [x^{(1)}, x^{(2)}, \ldots, x^{(d)}]$ be a $d$-dimensional input vector and $W = [w_1, w_2, \ldots, w_k]$ be the weight vector of the neuron. The convolution operation can then be written as $$z^{(j)} = \sum_{i=1}^{k} x^{(j+i-1)} w_i, \quad \forall j = 1, 2, \ldots, d-k+1.$$
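To make this concrete, here is a minimal NumPy sketch of the operation above; the helper and its toy values are for illustration only and are not part of yann:

import numpy as np

def conv1d(x, w):
    """Slide one neuron with shared weights w across x and collect its outputs."""
    d, k = len(x), len(w)
    # one output z[j] for every position j = 1, ..., d - k + 1
    return np.array([np.dot(x[j:j + k], w) for j in range(d - k + 1)])

x = np.array([0., 1., 1., 0., 0., 1.])   # a toy 6-dimensional input
w = np.array([-1., 2., -1.])             # k = 3 shared weights, a crude edge detector
print(conv1d(x, w))                      # 6 - 3 + 1 = 4 outputs: [ 1.  1. -1. -1.]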
The value $k$ we used before, which represents the number of inputs the neuron learns from, can be seen as the number of signals it receives from the previous layer. Therefore $k$ is called the receptive field. It can also be viewed across layers. If $k$ is 3, the neuron's receptive field over the previous layer is 3. What about the neuron's receptive field with respect to the layer before that? Each of those 3 neurons itself receives input from 3 neurons in the layer before it, so the current neuron is fed by $3 \times 3 = 9$ connections two layers back; with a stride of one the windows overlap, so they cover 5 distinct inputs. Either way, the receptive field grows as we stack layers.
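As a quick check of this arithmetic, here is a small hypothetical helper (not part of yann) that tracks how the receptive field grows as stride-1 convolution layers are stacked:

def receptive_field(filter_sizes, strides=None):
    """Receptive field (in input positions) of a neuron on top of stacked 1-D convolutions."""
    strides = strides or [1] * len(filter_sizes)
    rf, jump = 1, 1
    for k, s in zip(filter_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) * current jump
        jump *= s
    return rf

print(receptive_field([3]))      # 3 : one layer with k = 3
print(receptive_field([3, 3]))   # 5 : two stacked stride-1 layers with k = 3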
From the previous section we learned that each neuron receives input from $k$ neighboring values in the input and produces one output. As the neuron moves over the entire layer, it produces $d-k+1$ outputs, where $d$ is the number of dimensions in the input. Since each neuron produces $d-k+1$ outputs, a layer of $n$ neurons produces $n(d-k+1)$ outputs. For example, for a layer of 20 neurons on a 100-dimensional input with $k=3$, we get $20 \times 98 = 1960$ outputs. Going from 100 to 1960 is a huge jump, and if this keeps increasing it quickly becomes computationally unwieldy. To prevent that we have a special operation called pooling: we downsample the output into a lower dimension by taking a single value from each window of a given pooling length. Depending on which value we take, we call it either average pooling or max pooling.
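A minimal NumPy sketch of this downsampling, assuming non-overlapping windows of the given pooling length (again, illustrative code rather than yann's implementation):

import numpy as np

def pool1d(z, length, mode='max'):
    """Downsample z by keeping one value per non-overlapping window of `length`."""
    windows = z[:len(z) // length * length].reshape(-1, length)
    return windows.max(axis=1) if mode == 'max' else windows.mean(axis=1)

z = np.array([1., 3., 2., 5., 4., 0.])
print(pool1d(z, 2, mode='max'))   # [3. 5. 4.]    max pooling
print(pool1d(z, 2, mode='avg'))   # [2.  3.5 2. ] average pooling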
We can also reduce the number of outputs by using strides instead of pooling. A stride is equivalent to a hop: it is the number of positions we skip from one convolution to the next. We can use longer strides when we do not want neighboring outputs to read from common inputs.
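Extending the earlier 1-D convolution sketch with a stride shows how longer hops shrink the number of outputs without any pooling (an illustrative sketch, not yann code):

import numpy as np

def conv1d_strided(x, w, stride=1):
    """Convolution that hops `stride` positions between consecutive windows."""
    d, k = len(x), len(w)
    positions = range(0, d - k + 1, stride)   # floor((d - k) / stride) + 1 outputs
    return np.array([np.dot(x[j:j + k], w) for j in positions])

x = np.arange(10, dtype=float)
w = np.ones(3) / 3.0                          # a simple moving-average filter
print(len(conv1d_strided(x, w, stride=1)))    # 8 outputs, neighboring windows overlap
print(len(conv1d_strided(x, w, stride=3)))    # 3 outputs, windows share no inputs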
We can add a convolution layer to our network using the add_layer function with the following syntax:
net.add_layer ( type = "conv_pool",
                origin = "input",
                id = "conv_pool_1",
                num_neurons = 20,
                filter_size = (5,5),
                pool_size = (3,3),
                activation = ('maxout', 'maxout', 2),
                batch_norm = True,
                regularize = True,
                verbose = verbose
              )
from yann.network import network
from yann.utils.graph import draw_network
from yann.special.datasets import cook_mnist
def lenet5 ( dataset= None, verbose = 1, regularization = None ):
    """
    This function is a demo example of lenet5 from the famous paper by Yann LeCun.
    This is an example code.

    Warning:
        This is not the exact implementation but a modern re-incarnation.

    Args:
        dataset: Supply a dataset.
        verbose: Similar to the rest of the toolbox.
    """
    optimizer_params = {
                "momentum_type"   : 'nesterov',
                "momentum_params" : (0.65, 0.97, 30),
                "optimizer_type"  : 'rmsprop',
                "id"              : "main"
                       }

    dataset_params = {
                "dataset"   : dataset,
                "svm"       : False,
                "n_classes" : 10,
                "id"        : 'data'
                     }

    visualizer_params = {
                "root"            : 'lenet5',
                "frequency"       : 1,
                "sample_size"     : 144,
                "rgb_filters"     : True,
                "debug_functions" : False,
                "debug_layers"    : False,  # set these to True to print everything.
                "id"              : 'main'
                        }

    # initialize the network
    net = network( borrow = True,
                   verbose = verbose )

    # or you can add modules after you create the net.
    net.add_module ( type = 'optimizer',
                     params = optimizer_params,
                     verbose = verbose )

    net.add_module ( type = 'datastream',
                     params = dataset_params,
                     verbose = verbose )

    net.add_module ( type = 'visualizer',
                     params = visualizer_params,
                     verbose = verbose )

    # add an input layer
    net.add_layer ( type = "input",
                    id = "input",
                    verbose = verbose,
                    datastream_origin = 'data', # if you didn't add a datastream
                                                # module, now is the time.
                    mean_subtract = False )

    # add first convolutional layer
    net.add_layer ( type = "conv_pool",
                    origin = "input",
                    id = "conv_pool_1",
                    num_neurons = 20,
                    filter_size = (5,5),
                    pool_size = (3,3),
                    activation = ('maxout', 'maxout', 2),
                    # regularize = True,
                    verbose = verbose
                    )

    net.add_layer ( type = "conv_pool",
                    origin = "conv_pool_1",
                    id = "conv_pool_2",
                    num_neurons = 50,
                    filter_size = (3,3),
                    pool_size = (1,1),
                    activation = 'relu',
                    # regularize = True,
                    verbose = verbose
                    )

    net.add_layer ( type = "dot_product",
                    origin = "conv_pool_2",
                    id = "dot_product_1",
                    num_neurons = 1250,
                    activation = 'relu',
                    # regularize = True,
                    verbose = verbose
                    )

    net.add_layer ( type = "dot_product",
                    origin = "dot_product_1",
                    id = "dot_product_2",
                    num_neurons = 1250,
                    activation = 'relu',
                    # regularize = True,
                    verbose = verbose
                    )

    net.add_layer ( type = "classifier",
                    id = "softmax",
                    origin = "dot_product_2",
                    num_classes = 10,
                    # regularize = True,
                    activation = 'softmax',
                    verbose = verbose
                    )

    net.add_layer ( type = "objective",
                    id = "obj",
                    origin = "softmax",
                    objective = "nll",
                    datastream_origin = 'data',
                    regularization = regularization,
                    verbose = verbose
                    )

    learning_rates = (0.05, .0001, 0.001)

    net.pretty_print()
    draw_network(net.graph, filename = 'lenet.png')

    # cook the network: finalize the architecture and compile its functions.
    net.cook()

    net.train( epochs = (20, 20),
               validate_after_epochs = 1,
               training_accuracy = True,
               learning_rates = learning_rates,
               show_progress = True,
               early_terminate = True,
               patience = 2,
               verbose = verbose)

    print(net.test(verbose = verbose))

data = cook_mnist()
dataset = data.dataset_location()
lenet5 ( dataset, verbose = 2)